Some Examinations of Intrinsic Methods for Summary Evaluation Based on the Text Summarization Challenge (TSC)
Abstract
Computer-produced summaries have traditionally been evaluated by comparing them with human-produced summaries using the F-measure. However, the F-measure is not appropriate when alternative sentences are possible in a human-produced extract. In this paper, we examine some evaluation methods devised to overcome this problem, including utility-based evaluation. By giving scores to moderately important sentences that do not appear in the human-produced extract, utility-based evaluation can resolve the problem. However, the method requires considerable human effort to provide data for evaluation. In this paper, we first propose a pseudo-utility-based evaluation that uses human-produced extracts at different compression ratios. To evaluate the effectiveness of pseudo-utility-based evaluation, we compare our method with the F-measure using the data of the Text Summarization Challenge (TSC), and show that pseudo-utility-based evaluation can resolve this problem. Next, we focus on content-based evaluation. Instead of measuring the ratio of sentences that match exactly in the extract, this method evaluates extracts by comparing their content words with those of human-produced extracts. Although the method has been reported to be effective in resolving the problem, it has not been examined in the context of comparing two extracts produced by different systems. We evaluated computer-produced summaries by content-based evaluation and compared the results with a subjective evaluation. We found that the judgments of the content-based measure matched those of the subjective evaluation in 93% of the cases when the gap between the content-based scores of two summaries was more than 0.2.
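The contrast between the two measures can be illustrated with a minimal sketch. The function names and the toy data below are ours, not from the paper, and the content-based score is modeled here as cosine similarity over content-word frequency vectors, which is one common instantiation of the idea rather than necessarily the exact formula the paper uses:

```python
from collections import Counter
import math

def f_measure(system_ids, human_ids):
    """Sentence-level F-measure: harmonic mean of precision and recall
    over the sets of extracted sentence IDs."""
    system, human = set(system_ids), set(human_ids)
    overlap = len(system & human)
    if overlap == 0:
        return 0.0
    precision = overlap / len(system)
    recall = overlap / len(human)
    return 2 * precision * recall / (precision + recall)

def content_based_score(system_words, human_words):
    """Content-based evaluation (illustrative): cosine similarity between
    the content-word frequency vectors of the two summaries."""
    sv, hv = Counter(system_words), Counter(human_words)
    dot = sum(sv[w] * hv[w] for w in sv)
    norm = (math.sqrt(sum(c * c for c in sv.values()))
            * math.sqrt(sum(c * c for c in hv.values())))
    return dot / norm if norm else 0.0

# A system extract that picks an alternative but equally valid sentence is
# penalized by the F-measure (only sentence 4 matches here) ...
print(f_measure([1, 4, 7], [2, 4, 9]))
# ... while content-based evaluation can still credit overlapping vocabulary.
print(content_based_score(
    ["summary", "evaluation", "method"],
    ["evaluation", "method", "measure"]))
```

This is why the F-measure breaks down when human extractors legitimately disagree on sentence choice: two equally good extracts with different sentence IDs can have zero overlap, while their content words remain very similar.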
Similar Papers
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Comparison of Some Automatic and Manual Methods for Summary Evaluation Based on the Text Summarization Challenge 2
In this paper, we compare some automatic and manual methods for summary evaluation. One of the essential points for evaluating a summary is how well the evaluation measure recognizes slight differences in the quality of the computer-produced summaries. In terms of this point, we examined ‘evaluation by revision’ using the data of the Text Summarization Challenge 2 (TSC2). Evaluation by revision...
Graph Hybrid Summarization
One solution for processing and analyzing massive graphs is summarization. Generating a high-quality summary is the main challenge of graph summarization. With the aim of generating a better-quality summary for a given attributed graph, both structural and attribute similarities must be considered. There are two measures, named density and entropy, to evaluate the quality of structural and at...
Biogeography-Based Optimization Algorithm for Automatic Extractive Text Summarization
Given the increasing number of documents, sites, and online sources, and users' desire to access information quickly, automatic text summarization has caught the attention of many researchers in this field. Researchers have presented different methods for text summarization, producing a useful summary of a text from its relevant sentences. This study select...
An Automatic Method for Summary Evaluation Using Multiple Evaluation Results by a Manual Method
To solve the problem of how to evaluate computer-produced summaries, a number of automatic and manual methods have been proposed. Manual methods evaluate summaries correctly, because humans evaluate them, but are costly. On the other hand, automatic methods, which use evaluation tools or programs, are low cost, although they cannot evaluate summaries as accurately as manual methods. In t...